Overview

Dataset Statistics

Number of Variables 24
Number of Rows 844338
Missing Cells 1.6887e+06
Missing Cells (%) 8.3%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 245.3 MB
Average Row Size in Memory 304.6 B
Variable Types
  • Numerical: 10
  • Categorical: 12
  • DateTime: 2
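
The overview figures can be cross-checked with a little arithmetic. The sketch below makes two assumptions that the report does not state. First, the two fully missing columns (assortment and state_holiday, listed under Dataset Insights) account for every missing cell. Second, the report's "MB" means binary megabytes (MiB, 1024² bytes), because that reading reproduces the 304.6 B average row size.

```python
# Figures copied from the Dataset Statistics table above.
n_rows = 844_338
n_vars = 24
total_size_mb = 245.3  # reported "Total Size in Memory"

# Assumption: only assortment and state_holiday contain missing cells,
# and each of them is missing in every row.
missing_cells = 2 * n_rows
missing_pct = 100 * missing_cells / (n_rows * n_vars)

# Assumption: "MB" is a binary megabyte (MiB = 1024**2 bytes).
bytes_per_row_mib = total_size_mb * 1024**2 / n_rows
# For comparison: the decimal reading (1 MB = 10**6 bytes).
bytes_per_row_mb = total_size_mb * 10**6 / n_rows

print(f"missing cells: {missing_cells:.4e} ({missing_pct:.1f}%)")
print(f"avg row size, MiB reading: {bytes_per_row_mib:.1f} B")
print(f"avg row size, MB reading:  {bytes_per_row_mb:.1f} B")
```

Only the MiB reading gives the reported 304.6 B. The decimal reading gives about 290.5 B.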

Dataset Insights

  • assortment has 844338 (100.0%) missing values
  • state_holiday has 844338 (100.0%) missing values
  • competition_distance is skewed
  • competition_open_since_year is skewed
  • sales is skewed
  • competition_time_month is skewed
  • promo_time_week is skewed
  • store_type has constant length 1
  • promo2 has constant length 1
  • promo2_since_year has constant length 4
  • day_of_week has constant length 1
  • promo has constant length 1
  • school_holiday has constant length 1
  • is_promo has constant length 1
  • year has constant length 4
  • dayofweek has constant length 1
  • assortment has a single distinct value ("nan")
  • state_holiday has a single distinct value ("nan")
  • competition_time_month has 70101 (8.3%) negative values
  • promo_time_week has 57241 (6.78%) negative values
  • competition_time_month has 268025 (31.74%) zeros
  • promo_time_week has 421646 (49.94%) zeros

Variables

store

numerical

Approximate Distinct Count 1115
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 558.4214
Minimum 1
Maximum 1115
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • store is nearly symmetric (γ1 = 0.0004)

Quantile Statistics

Minimum 1
5-th Percentile 56
Q1 280
Median 558
Q3 837
95-th Percentile 1060
Maximum 1115
Range 1114
IQR 557

Descriptive Statistics

Mean 558.4214
Standard Deviation 321.7309
Variance 103510.7472
Sum 4.715e+08
Skewness 0.00042588
Kurtosis -1.1988
Coefficient of Variation 0.5761
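
The store statistics (mean ≈ 558, skewness ≈ 0, kurtosis ≈ -1.2) match what a discrete uniform distribution over 1 to 1115 produces, which is what roughly equal row counts per store would give. The sketch below uses one value per store id as a stand-in for the column. It computes population (biased) moment estimates. The report does not say which estimators it uses, so small differences from the table are expected, such as 321.87 here against the reported 321.73. Kurtosis is treated as excess kurtosis (normal = 0), because a plain Pearson kurtosis can never be negative.

```python
import math

def moments(values):
    """Population mean, std, skewness (gamma_1) and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments m2, m3, m4 using population (biased) estimators.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    return mean, std, skew, excess_kurt

# Stand-in for the store column (assumption): one observation per id 1..1115.
stores = list(range(1, 1116))
mean, std, skew, kurt = moments(stores)
cv = std / mean  # coefficient of variation
print(f"mean={mean:.1f} std={std:.2f} skew={skew:.4f} kurt={kurt:.4f} cv={cv:.4f}")
```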

store_type

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB
  • The most frequent value (a) occurs over 1.77 times as often as the second most frequent value (d)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row c
2nd row c
3rd row c
4th row c
5th row c

Letter

Count 844338
Lowercase Letter 844338
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (a, d) account for over 50.0% of rows
  • Value c occurs over 7.26 times as often as value b
  • store_type has words of constant length
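
The Letter tables appear to count characters rather than rows, grouped by Unicode general category: Lowercase Letter (Ll), Uppercase Letter (Lu), Space Separator (Zs), Dash Punctuation (Pd) and Decimal Number (Nd). This reading fits the counts elsewhere in the report; for year, 844338 rows × 4 digits = 3377352. The sketch below builds such a tally with the standard-library `unicodedata` module. Both the label-to-category mapping and the definition of "Count" as the number of letters are assumptions, not the report's documented implementation.

```python
import unicodedata
from collections import Counter

# Report label -> Unicode general category code (assumed mapping).
LABELS = {
    "Lowercase Letter": "Ll",
    "Space Separator": "Zs",
    "Uppercase Letter": "Lu",
    "Dash Punctuation": "Pd",
    "Decimal Number": "Nd",
}

def letter_table(values):
    """Count every character of every value by Unicode general category."""
    cats = Counter(unicodedata.category(ch) for v in values for ch in v)
    table = {label: cats.get(code, 0) for label, code in LABELS.items()}
    # Assumption: "Count" is the number of letters (lowercase + uppercase).
    table["Count"] = cats.get("Ll", 0) + cats.get("Lu", 0)
    return table

print(letter_table(["c", "a", "d"]))
print(letter_table(["2015", "2014"]))
```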

assortment

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 54.8 MB

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row nan
2nd row nan
3rd row nan
4th row nan
5th row nan

Letter

Count 2533014
Lowercase Letter 2533014
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • assortment has words of constant length
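
In this section, assortment shows 0 missing values and samples of "nan", but the Dataset Insights list the same column as 100% missing. state_holiday shows the same pattern. One explanation consistent with both is that every value is the three-letter lowercase string "nan" (length 3, and 2533014 = 3 × 844338 lowercase letters). That would happen if missing values were converted to text before profiling, for example by casting the column to `str`. The sketch below uses only the standard library to show how a float NaN becomes a non-missing string and how to map it back. The `restore_missing` helper is hypothetical.

```python
import math

raw = [float("nan"), float("nan")]  # a column that is entirely missing

# Casting to text turns NaN into the ordinary string "nan".
as_text = [str(v) for v in raw]

def is_missing(v):
    """True for None or float NaN; any string counts as present."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def restore_missing(values):
    """Hypothetical cleanup: map the literal string "nan" back to None."""
    return [None if v == "nan" else v for v in values]

print(as_text, [is_missing(v) for v in as_text])
print([is_missing(v) for v in restore_missing(as_text)])
```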

competition_distance

numerical

Approximate Distinct Count 655
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 5961.8275
Minimum 20
Maximum 200000
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • competition_distance is skewed right (γ1 = 10.1349)

Quantile Statistics

Minimum 20
5-th Percentile 140
Q1 720
Median 2340
Q3 6930
95-th Percentile 21790
Maximum 200000
Range 199980
IQR 6210

Descriptive Statistics

Mean 5961.8275
Standard Deviation 12592.1811
Variance 1.5856e+08
Sum 5.0338e+09
Skewness 10.1349
Kurtosis 145.2878
Coefficient of Variation 2.1121
  • competition_distance is not normally distributed (p-value 1.665501592739637e-23)
  • competition_distance has 82882 outliers
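
The report does not say how it counts outliers. Tukey's fences are a common convention and are used here as an assumption: a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. Under that rule the fences follow directly from the quartiles above. With Q1 = 720 and Q3 = 6930, the upper fence is 16245. That is below the 95th percentile of 21790, so at least 5% of rows would be flagged. The reported 82882 outliers (about 9.8% of rows) are consistent with this.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences from the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of competition_distance, copied from the table above.
lower, upper = tukey_fences(720, 6930)
print(lower, upper)

# Share of rows the report flags as outliers.
print(f"{100 * 82882 / 844338:.1f}%")
```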

competition_open_since_month

numerical

Approximate Distinct Count 12
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 6.7874
Minimum 1
Maximum 12
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • competition_open_since_month is skewed left (γ1 = -0.0485)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 4
Median 7
Q3 10
95-th Percentile 12
Maximum 12
Range 11
IQR 6

Descriptive Statistics

Mean 6.7874
Standard Deviation 3.3099
Variance 10.9555
Sum 5.7308e+06
Skewness -0.04845
Kurtosis -1.2319
Coefficient of Variation 0.4877
  • competition_open_since_month is not normally distributed (p-value 9.473822947112525e-05)

competition_open_since_year

numerical

Approximate Distinct Count 23
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 2010.3311
Minimum 1900
Maximum 2015
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • competition_open_since_year is skewed left (γ1 = -7.2173)

Quantile Statistics

Minimum 1900
5-th Percentile 2002
Q1 2008
Median 2012
Q3 2014
95-th Percentile 2015
Maximum 2015
Range 115
IQR 6

Descriptive Statistics

Mean 2010.3311
Standard Deviation 5.5026
Variance 30.2789
Sum 1.6974e+09
Skewness -7.2173
Kurtosis 123.9023
Coefficient of Variation 0.002737
  • competition_open_since_year is not normally distributed (p-value 2.7953881706506847e-21)
  • competition_open_since_year has 9008 outliers

promo2

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 844338
  • The top 2 categories (0, 1) account for over 50.0% of rows
  • promo2 has words of constant length

promo2_since_week

numerical

Approximate Distinct Count 52
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 23.6291
Minimum 1
Maximum 52
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • promo2_since_week is skewed right (γ1 = 0.1704)

Quantile Statistics

Minimum 1
5-th Percentile 3
Q1 12
Median 22
Q3 37
95-th Percentile 47
Maximum 52
Range 51
IQR 25

Descriptive Statistics

Mean 23.6291
Standard Deviation 14.2883
Variance 204.1559
Sum 1.9951e+07
Skewness 0.1704
Kurtosis -1.1948
Coefficient of Variation 0.6047
  • promo2_since_week is not normally distributed (p-value 2.3260243980759022e-06)

promo2_since_year

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 55.6 MB

Length

Mean 4
Standard Deviation 0
Median 4
Minimum 4
Maximum 4

Sample

1st row 2015
2nd row 2015
3rd row 2015
4th row 2015
5th row 2015

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3377352
  • The top 2 categories (2013, 2014) account for over 50.0% of rows
  • promo2_since_year has words of constant length

day_of_week

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 5
2nd row 4
3rd row 3
4th row 2
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 844338
  • day_of_week has words of constant length

date

datetime

Approximate Distinct Count 941.7339
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 12.9 MB
Minimum 2013-01-01 00:00:00
Maximum 2015-07-31 00:00:00
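
The fractional distinct count is an estimate, not an exact tally. Assuming there is at least one row for every calendar day between the minimum and maximum (the report does not confirm this), the exact number of distinct dates is simple to compute. The result, 942, agrees closely with the estimate.

```python
from datetime import date

first = date(2013, 1, 1)   # Minimum above
last = date(2015, 7, 31)   # Maximum above

# Both endpoints are included, hence the +1.
n_days = (last - first).days + 1
print(n_days)  # 942
```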

sales

numerical

Approximate Distinct Count 21733
Approximate Unique (%) 2.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 6955.9591
Minimum 46
Maximum 41551
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sales is skewed right (γ1 = 1.5949)

Quantile Statistics

Minimum 46
5-th Percentile 3240
Q1 4877
Median 6397
Q3 8388
95-th Percentile 12905
Maximum 41551
Range 41505
IQR 3511

Descriptive Statistics

Mean 6955.9591
Standard Deviation 3103.8155
Variance 9.6337e+06
Sum 5.8732e+09
Skewness 1.5949
Kurtosis 4.854
Coefficient of Variation 0.4462
  • sales is not normally distributed (p-value 3.6500779321355204e-07)
  • sales has 30353 outliers
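
Some of the descriptive statistics are derived from others. Variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The check below recomputes both for sales from the rounded values in the table.

```python
mean = 6955.9591
std = 3103.8155

variance = std ** 2   # compare with "Variance 9.6337e+06"
cv = std / mean       # compare with "Coefficient of Variation 0.4462"
print(f"{variance:.4e} {cv:.4f}")
```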

promo

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 844338
  • The top 2 categories (0, 1) account for over 50.0% of rows
  • promo has words of constant length

state_holiday

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 54.8 MB

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row nan
2nd row nan
3rd row nan
4th row nan
5th row nan

Letter

Count 2533014
Lowercase Letter 2533014
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • state_holiday has words of constant length

school_holiday

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB
  • The most frequent value (0) occurs over 4.17 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 844338
  • The top 2 categories (0, 1) account for over 50.0% of rows
  • school_holiday has words of constant length

is_promo

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB
  • The most frequent value (0) occurs over 5.45 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 844338
  • The top 2 categories (0, 1) account for over 50.0% of rows
  • is_promo has words of constant length
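
In a two-valued column, the frequency ratio reported for school_holiday (4.17) and is_promo (5.45) bounds the share of the rarer value. If 0 occurs more than r times as often as 1, then 1 makes up less than 1 / (1 + r) of the rows. The sketch assumes both columns contain only the values 0 and 1, as their distinct counts of 2 suggest.

```python
def minority_share_bound(ratio):
    """Upper bound on the rarer value's share when the common value
    occurs more than `ratio` times as often (two-valued column)."""
    return 1 / (1 + ratio)

for name, ratio in [("school_holiday", 4.17), ("is_promo", 5.45)]:
    print(f"{name}: value 1 < {100 * minority_share_bound(ratio):.1f}% of rows")
```

This gives upper bounds of about 19.3% of rows for school_holiday and 15.5% for is_promo.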

year

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 55.6 MB

Length

Mean 4
Standard Deviation 0
Median 4
Minimum 4
Maximum 4

Sample

1st row 2015
2nd row 2015
3rd row 2015
4th row 2015
5th row 2015

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 3377352
  • The top 2 categories (2013, 2014) account for over 50.0% of rows
  • year has words of constant length

month

numerical

Approximate Distinct Count 12
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 5.8458
Minimum 1
Maximum 12
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • month is skewed right (γ1 = 0.2577)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 3
Median 6
Q3 8
95-th Percentile 12
Maximum 12
Range 11
IQR 5

Descriptive Statistics

Mean 5.8458
Standard Deviation 3.324
Variance 11.0487
Sum 4.9358e+06
Skewness 0.2577
Kurtosis -1.0332
Coefficient of Variation 0.5686
  • month is not normally distributed (p-value 0.0008894504510882016)

weekofyear

numerical

Approximate Distinct Count 52
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 23.6469
Minimum 1
Maximum 52
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • weekofyear is skewed right (γ1 = 0.2623)

Quantile Statistics

Minimum 1
5-th Percentile 3
Q1 11
Median 23
Q3 35
95-th Percentile 49
Maximum 52
Range 51
IQR 24

Descriptive Statistics

Mean 23.6469
Standard Deviation 14.3899
Variance 207.0701
Sum 1.9966e+07
Skewness 0.2623
Kurtosis -1.0258
Coefficient of Variation 0.6085
  • weekofyear is not normally distributed (p-value 0.00019918034403648192)
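
A maximum of 52 for weekofyear is expected if the column holds ISO-8601 week numbers, which is an assumption here. 2013 and 2014 each have 52 ISO weeks. 2015 has 53, but the data ends on 2015-07-31, which falls in week 31. Python's `date.isocalendar()` shows this directly.

```python
from datetime import date

# December 28 always falls in the last ISO week of its year.
weeks_in_year = {y: date(y, 12, 28).isocalendar()[1] for y in (2013, 2014, 2015)}
print(weeks_in_year)

# ISO week numbers of the first and last dates in the data.
first_week = date(2013, 1, 1).isocalendar()[1]
last_week = date(2015, 7, 31).isocalendar()[1]
print(first_week, last_week)
```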

dayofweek

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 53.1 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 4
2nd row 3
3rd row 2
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 844338
  • dayofweek has words of constant length
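
The samples suggest that day_of_week and dayofweek encode the same weekday with different bases: the first row shows 5 in day_of_week and 4 in dayofweek. This matches the two conventions in Python's standard library. `isoweekday()` numbers Monday = 1 to Sunday = 7, and `weekday()` numbers Monday = 0 to Sunday = 6. pandas' `dayofweek` also uses Monday = 0. If the first row holds the last date in the data, 2015-07-31 (a Friday), both sample values match, but that row order is an assumption.

```python
from datetime import date, timedelta

d = date(2015, 7, 31)  # assumption: the date behind the first sample row
print(d.isoweekday(), d.weekday())  # 1-based vs 0-based weekday

# Over a full week the two encodings always differ by exactly one.
monday = date(2015, 7, 27)
week = [monday + timedelta(days=i) for i in range(7)]
assert all(x.isoweekday() == x.weekday() + 1 for x in week)
print([x.weekday() for x in week])  # 0 (Monday) through 6 (Sunday)
```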

seasons

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 56.9 MB

Length

Mean 5.6279
Standard Deviation 0.7783
Median 6
Minimum 4
Maximum 6

Sample

1st row summer
2nd row summer
3rd row summer
4th row summer
5th row summer

Letter

Count 4751878
Lowercase Letter 4751878
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (spring, winter) account for over 50.0% of rows
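
The length statistics give a rough estimate of how often the shortest label occurs. This rests on an assumption: three of the four seasons have six-letter names (spring, summer, winter) and one has a four-letter name. The minimum length of 4 suggests "fall", but this report does not show that label. Under that assumption, a mean length of 5.6279 implies that about 18.6% of rows carry the four-letter label.

```python
mean_len = 5.6279           # "Mean" under Length above
long_len, short_len = 6, 4  # assumed label lengths

# mean_len = long_len * (1 - p) + short_len * p, solved for p
p_short = (long_len - mean_len) / (long_len - short_len)
print(f"{100 * p_short:.1f}% of rows use the {short_len}-letter label")

# Cross-check: total letters / rows should reproduce the mean length.
print(round(4_751_878 / 844_338, 4))
```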

competition_time_month

numerical

Approximate Distinct Count 376
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 41.6797
Minimum -32
Maximum 1407
Zeros 268025
Zeros (%) 31.7%
Negatives 70101
Negatives (%) 8.3%
  • competition_time_month is skewed right (γ1 = 7.3388)

Quantile Statistics

Minimum -32
5-th Percentile -6
Q1 0
Median 17
Q3 75
95-th Percentile 149
Maximum 1407
Range 1439
IQR 75

Descriptive Statistics

Mean 41.6797
Standard Deviation 66.8144
Variance 4464.1657
Sum 3.5192e+07
Skewness 7.3388
Kurtosis 126.8551
Coefficient of Variation 1.603
  • competition_time_month is not normally distributed (p-value 2.253766894301718e-21)
  • competition_time_month has 9741 outliers
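
The report does not define competition_time_month. One reconstruction fits its maximum: take the days between the row's date and the competitor's opening date, divide by 30, and round down. This assumes the opening date is the first day of competition_open_since_month in competition_open_since_year. The latest date in the data is 2015-07-31 and the oldest opening year is 1900 (the minimum of competition_open_since_year). If that row's opening month is January, this gives exactly the reported maximum of 1407. Under the same reading, negative values would be competitors that opened after the row's date. The function below is hypothetical.

```python
from datetime import date

def competition_time_month(row_date, open_year, open_month):
    """Hypothetical: whole 30-day 'months' since the competitor opened."""
    opened = date(open_year, open_month, 1)  # assumed: day 1 of the month
    return (row_date - opened).days // 30

# Latest date in the data vs. the earliest opening year (month assumed = 1).
print(competition_time_month(date(2015, 7, 31), 1900, 1))  # 1407
```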

promo_since

datetime

Approximate Distinct Count 167.2131
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 12.9 MB
Minimum 2009-07-27 00:00:00
Maximum 2015-07-27 00:00:00
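
Both endpoints of promo_since are Mondays that start an ISO week. 2009-07-27 is the Monday of ISO week 31 of 2009, and 2015-07-27 is the Monday of ISO week 31 of 2015. This suggests promo_since may be built from promo2_since_year and promo2_since_week, though the report does not say so. The sketch below shows one such construction using `date.fromisocalendar` (Python 3.8+); the actual pipeline may do it differently.

```python
from datetime import date

def promo_start(year, week):
    """Hypothetical: Monday of the given ISO week."""
    return date.fromisocalendar(year, week, 1)

print(promo_start(2009, 31), promo_start(2015, 31))  # 2009-07-27 2015-07-27
```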

promo_time_week

numerical

Approximate Distinct Count 440
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.9 MB
Mean 54.4007
Minimum -126
Maximum 313
Zeros 421646
Zeros (%) 49.9%
Negatives 57241
Negatives (%) 6.8%
  • promo_time_week is skewed right (γ1 = 1.1034)

Quantile Statistics

Minimum -126
5-th Percentile -16
Q1 0
Median 0
Q3 111
95-th Percentile 233
Maximum 313
Range 439
IQR 111

Descriptive Statistics

Mean 54.4007
Standard Deviation 85.4576
Variance 7302.9944
Sum 4.5933e+07
Skewness 1.1034
Kurtosis 0.113
Coefficient of Variation 1.5709
  • promo_time_week is not normally distributed (p-value 5.844294879588795e-25)
  • promo_time_week has 11840 outliers
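
The report does not define promo_time_week either. If it counts whole weeks between the row's date and promo_since, which is an assumption, the earliest promo_since (2009-07-27) and the latest date (2015-07-31) give exactly the reported maximum of 313. The function name is hypothetical.

```python
from datetime import date

def promo_time_week(row_date, promo_since):
    """Hypothetical: whole weeks elapsed since the Promo2 start date."""
    return (row_date - promo_since).days // 7

print(promo_time_week(date(2015, 7, 31), date(2009, 7, 27)))  # 313
```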
